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ConceFT: CONCENTRATION OF FREQUENCY AND TIME VIA 
A MULTITAPERED SYNCHROSQUEEZED TRANSFORM 

INGRID DAUBECHIES, YI (GRACE) WANG, AND HAU-TIENG WU 


Abstract. A new method is proposed to determine the time-frequency con¬ 
tent of time-dependent signals consisting of multiple oscillatory components, 
with time-varying amplitudes and instantaneous frequencies. Numerical exper¬ 
iments as well as a theoretical analysis are presented to assess its effectiveness. 


1. Introduction 

Oscillatory signals occur in a wide range of fields, including geophysics, biology, 
medicine, finance and social dynamics. They often consist of several different os¬ 
cillatory components, the nature, time-varying behavior and interaction of which 
reflect properties of the underlying system. In general, we want to assess the num¬ 
ber, strength and rate of oscillation of the different components constituting the 
signal, to separate noise from signal, and to isolate individual components; efficient 
and robust extraction of this information from an observed signal will help us better 
describe and quantify the underlying dynamics that govern the system. For each of 
the quantities of interest listed, we thus want an estimator that is consistent, that 
has (ideally) small variance and that produces results robust to different types of 
noise. 

If the observed signal / can be written as a finite sum of so-called harmonic 
components, i.e. f{t) = cos(27r^£t -h Si), where ai > 0 (respectively ^i > 0) 

represents the strength or amplitude (respectively frequency) of the f-th compo¬ 
nent, then one can recover the ai and ^i from time-samples of f{t) via the Fourier 
transform / of /, defined by /(^) := f /(t)e“*^’^^*dt. (If the ^i are all integer mul¬ 
tiples of a common l/tg, then the integral can be taken over an interval of length 
to] when this is not the case, one can resort to integrals over long time intervals and 
average. Typically only discrete samples f{tn), n € Z, are known, rather than the 
continuous time course f{t), t G K, and the integrals are estimated by quadrature.) 
However, oscillatory signals of interest often have more complex behavior. We shall 
be interested in particular in signals that are still the combination of “elementary” 
oscillations, but in which both the amplitude and the frequency of the components 
are no longer constant; they can be written as 

K 

(1) f{t) = ^ Ak{t) cos(27r(/?fc(t)), 

k=l 

where K G N, Ak{t) > 0 and > 0 for all k, but Ak{t) and (/^^(t) are not 

constants. One can compute the Fourier transform / of such signals, and recover / 
from / (this can be validly done for a much wider class of functions), but it is now 
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less straightforward to determine the time-varying amplitudes Ak(t) and the so- 
called “instantaneous frequencies” from /. Although the time-local behavior 

of the oscillations, and their deviation from perfect periodicity, cannot be captured 
by the Fourier transform in an easily “readable” way, an accurate description of 
this instantaneous behavior is nevertheless important in many applications, both 
to understand the system producing the signal and to predict its future behavior. 
Examples in the medical field include studies of the circadian PH [S2] and cortical 
rhythms (62] , or of heart-rate [HEaiis] and respiratory variability [63I491E], all 
widely studied to understand physiology and predict clinical outcomes. 

The last 50 years have seen many approaches, in applied harmonic analysis and 
signal processing, to develop useful analysis tools for signals of this type; this is the 
domain of time-frequency (TF) analysis. Several algorithms and associated theories 
have been developed and widely applied (see, e.g., the overview [19]); well known 
examples include the short time Fourier transform (STFT), continuous wavelet 
transform (CWT) and Wigner-Ville distribution (WVD). The main idea is often 
to “localize” a portion of the signal in time, and then “measure” the oscillatory 
behavior of this portion. For example, given a function / G the windowed or 
short time Fourier transform (STFT) associated with a window function h{t) can 
be defined as: 

V}^\t,ri) := j /(s)/i(t-s)e-*2’^''(*-")ds 

where t G M. is the time, rj G M"*" is the frequency, h is the window function chosen 
by the user - a commonly used choice is the Gaussian function with kernel band¬ 
width cr > 0, i.e. h{t) = (27rcr)“^/^e“* \ (The overall phase factor is 

not always present in the STFT, leading to the name modified short time Fourier 
transform (mSTFT) for this particular form in |57j.l 

Other, more specialized methods, targeting in particular signals of type 0. 
include the empirical mode decomposition |28j . ensemble empirical mode decom¬ 
position 1^, the sparsity approach (SH, iterative convolution- filtering EH Ell, 
the approximation approach |10j . non-local mean approach [21) . time-varying au¬ 
toregression and moving average approach m as well as the synchrosqueezing 
transforms introduced and studied by some of us m HI [Ml El 

All TF methods that target reasonably large classes of functions (as opposed 
to functions with such specific models that complete characterization requires only 
fitting a small number of parameters) must face the Heisenberg uncertainty prin¬ 
ciple, limiting how accurately oscillatory information can be captured over short 
time intervals; for toy signals specially designed to have precise TF properties (e.g., 
chirps), this typically expresses itself by a “blurring” or “smearing out” of their TF 
representation, regardless of the analysis tool used. Reassignment methods (3011313, 
introduced in 1978 and recently attracting more attention again, were proposed to 
analyze and possibly counter this. Their main idea is to analyze the local behavior 
in the TF plane of portions of the representation, and determine nearby possible TF 
concentration candidates that best explain it; each small portion is then reallocated 
to its “right” place in the TF plane, to obtain a more concentrated TF representa¬ 
tion that, one hopes, gives a faithful and precise rendering of the TF properties of 
the signal. Reassignment methods can be applied to very general TF representa¬ 
tions 13113; they can be adaptive as well |3]- It has been argued recently El] that 
reassignment methods can be viewed as analogs of “non-local means” techniques 
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commonly applied in image processing; this provides an intuitive explanation for 
their robustness to noise. 

The synchrosqueezing transform (SST) can be viewed as a special reassignment 
method Eiiziia. In SST, the STFT or CWT coefficients are reassigned only in the 
frequency “direction” [131 EH ES] ; this preserves causality, making it possible to 
reconstruct each component with real-time implementation |9]. The STFT-based 
SST of / is defined as 

:= lim J 

where ga is an “approximate ^-function” (i.e. g is smooth and has fast decay, with 
f g(x)dx = 1, so that ga(t) ■= tends weakly to the delta measure 6 as 

a —>■ 0), and with defined by 

—id v) 

:= -^— if Oj and := —oo otherwise. 

^ 2ttV^ ^ 

The SST for CWT is defined similarly; see [T31IH1) or Section]^ SST was proposed 
originally for sound signals [13 E]; its theoretical properties have been studied 
extensively [13 IMl 13 Ell 13113113 ) including its stability to different types of noise 
[56l [8] . Several variations of the SST have been proposed [33 Ell [l3 E3 113 i ia 
particular, the SST-approach can also be used for other TF representations, such as 
the wave packet transform 113, and it can be extended to two-dimensional signals 
(such as images) [73111] . In addition, its practical usefulness has been demonstrated 
in a wide range of fields, including medicine [miMl E3 im [6313 [Ml [13 , mechanics 
[13 [13113, finance [23 [13 , geography [6ll [23113 , denoising [13 , atomic physics 
[5511501 [53 and image analysis [13 [13 • 

The SST approach can extract the instantaneous frequency and reconstruct the 
constitutional oscillatory components of a signal of type Q in the presence of noise 
[13 [3- However, its performance suffers when SNR gets low: as the noise level 
increases, and even before it completely obscures the main concentration in the TF 
plane of the signal, spurious concentration areas appear elsewhere in the TF plane, 
caused by correlations introduced by the overcomplete STFT or CWT analysis tool. 
The effect of these misleading perturbations, which downgrade the quality of the 
results, can be countered, to some extent, by multi-tapering. 

Multi-tapering is a technique originally proposed to reduce the variance and 
hence stabilize power spectrum estimation in the spectral analysis of stationary 
signals [13 [23 El- Sampling the signal during only a finite interval leads to arti¬ 
facts, traditionally reduced by tapering; an unfortunate side effect of tapering is 
to diminish the impact of samples at the extremes of the time interval. Thomson 
[58] showed that one can nevertheless exploit optimally the information provided 
by the samples at the extremities, by using several orthonormal functions as ta¬ 
pers: the average of the corresponding power spectra is a good estimator with 
reduced variance. This technique has since been applied widely [13 [2311311 [ 73 . 
Multi-tapering was later extended to non-stationary TF analysis by combining it 
with reassignment [7I1123E7]: a more robust “combined” reassigned TF represen¬ 
tation is obtained by picking orthonormal “windows” (used to isolate portions of 
the TF representation when working with a reassignment method), and averaging 
the reassigned TF representations determined by each of the individual windows. 
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Heuristically, the concentration for a “true” constituting component of the signal 
will be in similar locations in the TF plane for each of the individual reassigned 
TF representations, whereas the spurious concentrations, artifacts of correlations 
between noise and the windowing function, tend to not be co-located and have a di¬ 
minished impact when averaged. In the SST context a similar multi-taper idea was 
used by one of us in a study of anesthesia depth |42l [38] , in which J different win¬ 
dow functions hj, j = 1,..., J are considered, and the multi-taper SST (MTSST) 
is computed as follows: 



Using multiple tapers reduces artifacts, and the MTSST remains “readable” at 
higher noise levels than a “simple” SST jH] |35| . To optimally suppress noise arti¬ 
facts it is tempting to consider increasingly larger J. However, the area in the TF 
plane over which the signal TF information is “smeared out” also increases (lin¬ 
early) with J, and a balance needs to be observed; in the multi-taper reassignment 
method of [HI, for instance, 6 Hermite functions were used (i.e. J = 6.) 

In this paper, we introduce a new approach to obtain better concentrated time- 
frequency representations, which we call ConceFT, for Concentration in Frequency 
and Time. It is based on STFT- or CWT-based SST, but the approach could be 
applied to yet other TF decomposition tools. The ConceFT algorithm will be 
defined precisely below, in Section]^ Like MTSST, ConceFT starts from a multi¬ 
layered time-frequency representation, but instead of averaging the SST results 
obtained from STFT or CWT for orthonormal windows, which can be viewed as 
elements in a vector space of time-frequency functions, it considers many different 
projections in this same vector space, and averages the corresponding SSTs; for 
more details, see Section]^ Section [^studies the theoretical properties of ConceFT, 
and explains how it can provide reliable results under challenging SNR conditions; 
finally, in Section]^ we provide several numerical results. 

To conclude this introduction, we illustrate ConceFT on a simulated signal, in 
which the clean signal s{t) is composed of two oscillatory components: s{t) = si(<)-|- 
S 2 (t), where Si(t) = Ai{t) cos{2TTipi{t))x['3,i2]it), ands 2 (t) = A 2 (<) cos(27ri^2(t))X[o,8](i) 
(here x stands for the indicator function, X[a,b]{i) = 1 if a < t < 5, X[a,b]ii) = 0 
otherwise); Ai{t) > 0 and </j'(t) > 0 for i = 1,2. This signal is sampled at rate 
lOOHz, from < = 0 to t = 12 seconds. To these signal samples we add independent 
realizations of a fat-tailed noise which is identically-t4-Student-distributed with 
variance 2.036. The left panels in Figureshow the three constituents of the total 
(noisy) signal Y{t) = si(t) -l-S 2 (t) +^{t); note that each of si and S 2 “lives” during 
only part of the full time observation interval; the fat-tailed nature of the noise 
causes the bursty behavior evident in the plot of ^(t). The individual plots of the Si 
show the amplitude modulations Ai{t) of the s^; Figurealso graphs :/?((<) > 0 for 
i = 1,2. In addition. Figure shows the time course of both the clean signal s{t) 
and the noisy signal Y{t), at the same scale; their signal-to-noise ratio is —0.85, 

computed as 201ogig ^ std({(t)) ) ’ where std stands for standard deviation. Figure 
shows several SST-based results for this (quite challenging) example. For the clean 
signal s, the “mono-SST” (STFT-based, with a Gaussian window) performs quite 
well, with only some artifacts at the onset and cessation of the s^; many structured 
artifacts are visible in the mono-SST of the noisy signal Y. Both MTSST and 
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ConceFT remove the onset and cessation artifacts for the clean s (shown only for 
ConceFT in the figure, but similar for MTSST); the improvement is much more 
marked for the noisy signal Y: the spurious “bubbles” are suppressed to some ex¬ 
tent in the MTSST-based representation (using 2 orthonormal windows: the same 
Gaussian and the next higher-order Hermite function); a more dramatic improve¬ 
ment is seen in the ConceFT-representation corresponding to the same vector space 
of windows. 



Time (sec) 


Time (sec) 


Figure 1. Left panels: the three constituents of the noisy signal 
Y{t): oscillatory components si(t) (top), and S 2 {t) (middle), and 
the bursty iid t4-Student noise ^{t) (bottom). Note that Si(t) 7 ^ 0 
only for t > 3, S 2 (t) 7 ^ 0 only for t < 8 sec.; their respective 
amplitudes Ai{t) are plotted as envelopes for each. Right panels: 
plots of (solid) and (dashed) in top panel; the clean 

signal s = Si -I- S 2 (middle) and noisy signal, Y{t) = s{t) -I- ^(t) 
(bottom), plotted with the same scale. 


2. The ConceFT algorithm 

We start by briefly reviewing SST. In the introduction, we deflned STFT-based 
SST, discussed in more detail in [63l[55]; to show that the situation is very similar 
for CWT-based SST, we discuss that case here; see miHi for details. We start 
with the wavelet '0 with respect to which the CWT will be computed, which must 
necessarily have mean zero; that is, f 0(<) dt = 0; let’s also pick it to be a Schwartz 
function. We shall assume that we are dealing with real signals /; in this case the 
symmetry in ^ of /(^) makes it possible to consider only the “positive frequency 
part” of /, by picking 0 so that its Fourier transform 0 is supported on K_|_. (The 
approach can be extended easily to handle complex signals as well, but notation 
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Figure 2. Top left: STFT-based synchrosqueezing transform 
(SST) of the clean signal s{t) for a Gaussian window h; middle 
left: STFT-based SST of the noisy signal Y{t), with the same win¬ 
dow. Bottom left: multi-taper SST of y(f), choosing the Gaussian 
and the next 5 Hermite functions as windows; this result is closer 
to the STFT-based SST of s. Top right: GonceFT of s{t) based on 
the same two Hermite functions; middle right: GonceFT of Y{t) 
based on the two Hermite functions; bottom right: same as middle 
right, with plots of and superimposed. 


becomes a bit heavier.) Then the Continuous Wavelet Transform wj'^\a,b) of 
a tempered distribution /, with the variables a, b standing for scale and time 
location, is defined as the inner product of / with — b)/a). 

Even if the Fourier transform / is very concentrated around some frequency wq, 
the magnitude \ b)\ of the CWT will be spread out over a range of scales a, 

corresponding to a neighborhood of wq. However, the phase information of 

will still hold a “fingerprint” of luq on that whole neighborhood, in that b) 

will show oscillatory behavior in 6, with frequency wq, for a range of different a. 
This is the motivation for the synchrosqueezing transform, which shifts the CWT 
coefficients “back”, according to certain reassignment rules determined by the phase 
information. More concretely, we set a threshold T > 0, and then define 


n 


WT) 

/ 


{a,b) 


-idbWf\a,b) 
2TTW^j^\a, b) 


when \ W^^\a,b)\ > F 
when \ wj:'^\a,b)\ < F. 


— CX) 
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where db is the partial derivative with respect to b (see the Electronic Supplementary 
Materials - or ESM - for a remark concerning robust numerical implementation); 
the hard threshold E can be adjusted for best reduction of the numerical error and 
noise influence. The CWT-based synchrosqueezed transform (or CWT-based SST) 
then moves the CWT coefficient 6) to the “right” frequency slot, using 

as guideline: 


n(tp,r,a) 




/{a: Hv)'^>(a,6)|>r} « 


1 




(a, b)' 


-3/2da 


a 


where 0 < a ^ 1 is chosen by the user, g is a smooth function so that ^g{-) —>■ <5 
in the weak sense as a —>■ 0, and the factor is introduced to ensure that the 

integral of S^^’^’°‘\b,^) over ^ yields a close approximation to the original f{b). 
For more details, we refer the reader to [HIH]. 

Although both the CWT and its derived SST depend on the choice 

of the reference wavelet ip, this is much less pronounced for the SST; CWT-based 
SST corresponding to different reference wavelets lead to different but very simi¬ 
lar TF representations. (Theoretical reasons for this can be found in [iSlIH].) In 
particular, the dominant components in the TF representations are very similar. 
Moreover, even when the signal is contaminated by noise, these dominant compo¬ 
nents in the TF representations are not significantly disturbed [S]. However, the 
distribution of artifacts across the TF representation, induced by the noise, as seen 
in e.g. the middle left panel of Figure vary from one reference wavelet to an¬ 
other; this can be intuitively explained by observing that the CWT is essentially 
a convolution with (scaled versions of) the reference wavelet, so that the wavelet 
transforms of i.i.d. noise based on different orthogonal reference wavelets are inde¬ 
pendent. These observations lead to the idea of a multi-taper SST algorithm [i^l55] . 
In brief, given J orthonormal reference wavelets ipj, j = 1,..., J, one determines 
the reassignment rules as well as the corresponding and 

then defines the MTSST by 


MS^’“(6,e) := 

This suggests that averaging over a large number of orthonormal reference wavelets 
would smooth out completely the TF artifacts induced by the noise, as originally 
discussed for the reassignment method m- However, in order for reassignment to 
make sense, the reference function, whether it is the window h for STFT or the 
wavelet ip for CWT, must itself be fairly well concentrated in time and frequency, 
so that inner products with modulated window functions or scaled wavelets do 
not mix up different components and behaviors of the signal. On the other hand, 
there is a limit to how many orthonormal functions can be “mostly” supported in a 
concentrated region in the TF-plane - by a rule of thumb generalizing the Nyquist 
sampling density one can find, for a region TZ in the TF-plane, only Area(7^)/(27r) 
orthonormal functions that are mostly concentrated on TZ |12j . This limits how 
many different orthonormal ipj can be used in MTSST. 

ConceFT uses the different TF “views” provided by the CWT transforms 
in a different way, exploiting the non-linearity of the SST operation. (See the ESM 
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for a sketch of an alternate way in which one could extend multi-taper CWT, not 
pursued in this paper, however.) For each choice of '0, the collection of CWT 
where / ranges over the class of signals of interest, span a subspace of the space 
of all reasonably smooth functions of the two variables a, b. Different orthonormal 
generate different subspaces in combined, they generate a larger subspace, 
in which one can define an infinite number of “sections”, each corresponding to the 
collection of CWT generated by one reference wavelet. Each linear combination of 
the ipj defines such a CWT-space, in which one can carry out the corresponding SST 

operation. For i/; = J2j=i ’’f V'j) where Vj G K, one has = X]/=i C be¬ 

cause synchrosqueezing is a highly nonlinear operation, the corresponding 

are however not linear combinations of the in practice, the artificial con¬ 

centrations in the TF-plane, triggered by fortuitous correlations between the noise 
and the (overcomplete) occur at locations sufficiently different, for differ¬ 

ent choices of the vector r = (ri,... ,rj), that averaging over many choices of r 
successfully suppresses noise artifacts. 

More precisely, the CWT-based ConceFT algorithm proceeds as follows: 


Take J orthonormal reference wavelets, i/ii,..., i/ij, in the Schwartz space, 
with good concentration in the TF-plane. 

Pick N random vectors r„, n = 1,...,A^, of unit norm, in that is, 
uniformly select N samples in 




and = 


• For each n between 1 and N, define 

• Select the threshold F > 0 and the approximation parameter a > 0, and 

evaluate, for each n between 1 and N, the corresponding CWT-based 
SST of / by computing the reassignment rule and hence 

^), as defined above, with the minor adjustment that when 

the expression in the reassignment rule denomina¬ 

tor has a negative real part, we switch to the vector —r„. 

• The final ConceFT representation of / is then the average 


( 2 ) 


N 


cl" 




,r,a) 


ib,0- 


n—1 


In practice, J could be as small as 2, while N could be chosen as large as 
the user wishes. 

The square of the magnitude of ^), 




can be of interest in its own right, as an estimated time-varying Power Spectrum 
(tvPS) of /. 

STFT-based ConceFT representations are defined entirely analogously, based on 
the STFT-reassignment rule given in Section 
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3. Theoretical Results 

In this section, we list and explain theoretical results about CWT-based ConceFT. 
The detailed mathematical computations and proofs can be found in the ESM. En¬ 
tirely similar results hold for STFT-based ConceFT; since they are established by 
the same arguments, we skip those details. We start by recalling the structure of 
our signal space, as introduced in We emphasize that this is, to a large 

extent, a purely phenomenological model, constructed so as to reflect the fairly 
(but not exactly) periodic nature of many signals of interest, in particular (but not 
only) those of a physiological origin (see the discussion in pT]'). 

A single-component or intrinsic-mode type (IMT) function has the following 
form: 

F{t) = A{t) cos(27r(/3(t)), 

where the amplitude modulation A{t) and the phase function ip{t) are both reason¬ 
ably smooth; in addition, both A{t) and the derivative (or the “instantaneous 
frequency”) are strictly positive at all time as well as bounded; finally, we assume 
that A and ip' vary in time at rates that are slow compared to the instantaneous 
frequency of F itself. For the precise mathematical formulation of these conditions 
we refer to the ESM; this precise formulation invokes a few parameters, one of 
which, e, bounds the ratio of the rate of change of A and p'. This parameter will 
play a role in our estimates below. [Although we are assuming the signal to be 
real-valued here, all this can easily be adapted to the complex case by replacing the 
cosine with the corresponding complex exponential; the discussion in the remain¬ 
der of this section can be adapted similarly.] We also consider signals that contain 
several IMT components, that is, functions of the type 

L L 

(3) G{t)=Y^F,{t)=Y^ Ai{t) cos{27T(pi{t)), 

where each Fi is an IMT function, and we assume in addition that the instan¬ 
taneous frequencies p'^it) are ordered (higher i corresponding to larger p'^) and 
well-separated, 

(4) - p'eit) > d(v5^+i(t) + p'eit)) 

for all £ = 1 ,..., L — 1, for some d with 0 < d < 1. We denote by A the set of all 
such functions G; it provides a flexible adaptive harmonic model space for a wide 
class of signals of interest. (Strictly speaking, they are not “truly” harmonic, if 
harmonicity is interpreted - as it often is - as “having components with frequencies 
that are integer multiples of a fundamental frequency”.) 

Next, we turn to the noise model for which we prove our main theoretical result. 
For the purposes of this theoretical discussion, we use a simple additive Gaussian 
white noise (even though, as illustrated by the figures in the introduction, the 
approach works for much more challenging noise models as well!). That is, we 
consider our noisy signals to be of the form 

L L 

(5) Y{t) = G{t) + (T$(t) = '^Fg{t) -f cr$(t) = A£{t) cos{2np({t)) -j- a^{t), 

i=i 

where G = A, $ is a Gaussian white noise with standard deviation 1 

and CT > 0 is the noise level. Note that typically T is a generalized random process. 
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since by definition G is a tempered distribution. We could extend this, introducing 
also the trend and a more general noise model as in [^, the wave-shape function 
used in [M], or the generalized IMT functions that model oscillatory signals with 
fast varying instantaneous frequency of |31j . None of these generalizations would 
significantly affect the mathematical analysis, but to simplify the discussion, we 
restrict ourselves to the model (S.4). 

Finally, we describe the wavelets ipi,... ,’ipj with respect to which we compute 
the CWT of Y. For the sake of convenience of the theoretical analysis, we as¬ 
sume that they are smooth functions with fast decay, and that their Fourier trans¬ 


forms il)j are all real functions with compact support, supp^j C [1 — Aj, 1 -|- Aj], 
where 0 < Aj < 1. We also assume that the tpi,... ,tpj form an orthonormal 
set, that is, / 'ipi{x)ipj{x)dx = Sij, where 6ij is the Kronecker delta. To build 
appropriate linear combinations of the we dehne, for any unit-norm vector 
r = (ri,...,rj) in the corresponding combination as := ^ 

is convenient to characterize intervals for the scale a such that the support of 

overlaps where := (^)j we thus introduce the nota¬ 
tion = [(1 — Aj)/ip'f(b), (1 -I- Aj)/ip'^(b)]. It then follows from the definition 

of the CWT as the inner product between the signal and scaled, translated versions 
of the wavelets that (see m^) 


Wf/\a,b) 


6 ) -h ej(a, b) when a € z[^\b) 
tj{a,h) otherwise. 


where 


( 6 ) Qjj{a,b) = Ai{b)^/a'^pj{alp'^{b)) £ R 

and ej is of order e for all j = 1,..., J. Here Cj depends on the first three absolute 
moments of tpj and ip'j and the model parameters. It follows that the wavelet 
transform of Y ^ with respect to '0^, is given by 

L 

W^^^\a,b) = +e,(a,6 ) + 

where Xz^^\b) indicator function of the set z[^\h)] note that the ej{a,b)- 

term, again of order e, need not be the same as before. As shorthand notations, 
we will use bold symbols to regroup quantities indexed hy j = 1,... ,J into one 
J-dimensional vector, e.g. e{a,b) = [ei(a, 6), ..., ej(a, 6)]''' (which has norm of 
order e), $(a, 6) = ..., (a complex Gaussian random vector 

|22) . with mean [0 ,... ,0]''' S and covariance as well as relation matrix equal 
to /jxj - see ESM), W^{a,b) = &),..., ^^■'^(a, &)]t, and Qe{a,b) := 

lQi,eia, b),..., Qj/{a, &)]'''. Finally, '^(a, b) := r'^lVyia, b) or, more explicitly, 

L J 

(7) Wy' '\a,b) = [e(a,6) -f cr^{a,b)]. 

t=\ 3=1 ‘ 
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Under the general assumptions for our model, 


—idhW. 


(V-M) 


,/ L 


Y 


{a,b) = 2 tt I 

j=i e=i 


i2'mpj (b) 




e(a, b) + tT$(a, b) 


where e(a,b) is again a J-dimensional vector of order e, and $(a, &) is again a 
complex Gaussian random vector. The scalar products r'^^{a,b) and r'^^{a,b) 
are independent complex Gaussian random variables, with mean 0 and variance 
IkiP) J2j=i ’"j respectively. (See ESM.) Set now Zi{b) = (5). 

Then it follows that for a £ Zg{b), we get the following reassignment for the CWT 


tu, 




-idbW^'^\a,b) 

A ia,b) = _.. 


(5)g*2,r<pdb)Q^(a^ 6) + ?(a, b) + CT$(a, b) 


(V-M) 


'Y {a,b) 


rT b) + e(a, b) + cr$(a, 5)] 


2TrW^ 

which is a ratio random variable of two dependent complex Gaussian random vari¬ 
ables with non-zero means. Next, we consider, for each fixed realization of the 
random noise, the unit-norm vector r £ as a random vector, picked uniformly 
from S'« := {r £ ; r^W^^\a,b) > 2k and 3? (a, 6)) > o} C 

Restricting the choice of r to the subset of for which the inner product of r 

and 'W'^{a^ b) has magnitude larger than 2k reflects the threshold used in the SST 
algorithm (see Section]^; restricting r so that the inner product has positive real 
part means that we sample r from a half sphere rather than the whole sphere. (See 
ESM for more details.) 

Assuming that the bound on the noise is such that ||e -|- (J^\W < k, then the 
expectation of b) over S^, is given by 

E,4^’-)(a, b) = A{b) + (V,(a, b)) + E,, 

where Vi{a,b) :=e{a,b) + a^{a,b) — </3£(6)[e(a, 6) + a^{a,b)], denotes “taking 
the component along” a vector v, that is, P.„(m) = and Ei is bounded by 

1/2 




1 


1 - 


J-l 


\pQi{a,b)iyi{a,b))\ +c 


\\Vi{a,b)\\^^ 


J-l 


Furthermore the variance is bounded by 




(a, b)< ^ 


1 - 


J-l 


\pQe{a,b) {Ve{a,b)) 




J-l 


A detailed derivation, and an explicit expression for the constant c is given in the 
ESM; if J becomes large, we have c « 2-\/2/[K-\/7rJ]. The quantity |pQ^(a,6) iVi{a,b))\ = 

pQ,{a,b) b) + (T$(a, h) — (/)|(6)[e(a, h) + (T$(a, &)]^ , which occurs in several of 
these estimates, is, with high probability (with respect to the random noise pro¬ 
cess), fairly small for large J, because it is the norm of the projection onto Qiia, b) 
of Ve{a, b), and vectors that are unrelated (as is the case for Qi{a, b) and b)) 
have a higher chance of being close to orthogonal in higher dimensions. The other 

terms in (a, b) — ip'i{b) and in the variance Var^. lo^ ^(a, h) all have a factor 

J — 1 in the denominator. Our theoretical analysis thus proves that ConceFT, using 
a larger dimensional space of TF representations, and subsequently averaging over 
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the SST corresponding to random vectors in this larger dimensional space, leads 
to sharper estimates of the instantaneous frequencies for signals in A that are cor¬ 
rupted by noise. Even when J is not large, our bounds show that ConceFT leads 
to a reduction in potential deviation of the tvPS from the itvPS. 

The detailed estimates given in section ESM-3 are derived under the restric¬ 
tive conditions listed at the start of this section for the signals and the wavelets 
used. However, as noted above, these conditions can be relaxed significantly (at the 
price of more intricate estimates). In practice, we observe similar behavior in our 
numerical examples even for more complex situations; in particular, the method 
can handle noise models that are much more challenging, as illustrated in the next 
section as well as by Figure in section 

4. Numerical Experiments 

In this Section, we demonstrate the results of the ConceFT algorithm on exam¬ 
ples; we also discuss different choices for some of the different parameters involved. 
The ConceFT Matlab code and the codes leading to the figures in this paper could 
be found in https;//sites.google.com/site/hautiengwu/home/download. 

The first choice to be made, when applying CWT- or STFT-based ConceFT, 
concerns the family of orthonormal reference functions (wavelets or window func¬ 
tions) for the underlying wavelet or windowed Fourier transform. In both cases, 
we pick a family of eigenfunctions for a time-frequency localized operator designed 
for the CWT or STFT framework; as shown in [nmn] these can provide “opti¬ 
mal” localization within a restricted region of time-frequency space, where the size 
of the region depends solely on the number of functions used. More precisely, we 
use orthonormal Hermite functions for the STFT case [min] (see also Figure [^, 
and Morse wavelets for the CWT case HUSS]. In both cases, the shape of the 
localization domain in TF plane is not completely fixed, but can be adjusted by 
varying some parameters; for details, see FSM. Once the family of orthonormal 
reference functions is fixed, we need to decide how many ipj, j = 1,J we pick; 
this corresponds to choosing the size of the corresponding domain of concentration 
in the TF plane. Flexibility in the choices of shape and size of the TF localization 
domain make it possible to adapt ConceFT, to some extent, to the family of signals 
under consideration. Finally, ConceFT also depends on the number N of random 
projections chosen (see sections and |^ . In principle, the larger N, the closer the 
results are to the expected value of the random process, and the more we expect 
accidental correlations between reference function and the noise to cancel out in 
regions of the TF plane where the signal does not reside; in practice, increasing N 
beyond a certain value does not appreciably improve the results. In what follows, 
we explore these different choices for the CWT case, on a simple family of challeng¬ 
ing examples, with noise of different types (white Gaussian, Poisson and ARMA), 
and of different strengths. Results for the STFT case are similar; we will come back 
to them briefly below in subsection as well as (in more detail) in the FSM. 

For our test data, we restrict ourselves to simulated signals only, so as to be 
able to quantify the deviation from the “ground truth”, usually not available in 
real-life applications. (ConceFT results on concrete signals will appear elsewhere 
|41).l On the other hand, we want to avoid parametric models, so as to be suf¬ 
ficiently general. Accordingly, we generate a class C of non-stationary data via a 
random process described below in subsection each realization provides us not 
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only with a (simulated) clean signal, but also with the exact “ground truth” for 
the time-dependent instantaneous frequency and amplitude of the components of 
that signal. The same subsection also describes in detail three different noise mod¬ 
els (white Gaussian, Poisson and ARMA(1,1)) for which the approach is tested. 

After applying ConceFT to signals in C, we want to compare the ConceFT results 
with the optimal, ground truth TF representation; to quantify their (dis)similarity, 
we use an Optimal Transport (OT) distance, as described in subsection |4.2[ In 
subsection |4.3[ we discuss how choices of the parameters and of the number of or¬ 
thogonal Morse wavelets impact the ConceFT results, for this family of examples; 
subsection |4 . 4| illu strates the effect of the number N of random projections. Finally, 
in subsection |4.5[ we explore the effect on CWT-based ConceFT of different noise 
levels, for each of the three noise types we consider; subsection briefly discusses 
the STFT case. 

4.1. Data simulation. To generate a typical multi-component signal, we use smoothened 
Brownian path realizations to model the non-constant amplitudes and the instan¬ 
taneous frequencies of the components; more precisely, if W is the standard Brow¬ 
nian motion defined on [0,oo), then we define the smoothened Brownian motion 
with bandwidth B > 0 as ^W-kKB, where Kb is the Gaussian function with 
standard deviation B > 0 and ★ denotes the convolution operator. Given T > 0 and 
parameters Cij • ■ • Ce > 0, we then define the following family of random processes 
on [0,T]: 



For the amplitude A{t) of each IMT, we set (2 = (s = 0; every realization then 
varies smoothly between and (ji -I- Ca- In ths examples shown below and in 
the ESM, the signal consists of two components (i.e. L = 2) on [0,60]; their two 
amplitudes are independent realizations of 'I'j 2 _o.i, 20 o,o,o(I^)]- To simulate a phase 
function, we set Ci = Ca = 0; 'I'[o.C 2 ,o,o,C 6 ,C 6 ](^) I® then, appropriately, a monotoni- 
cally increasing process. In the examples we consider, we take for (pi(t) a realization 
of [ 0 . 10 , 0 , 0 . 6 , 400 ](t) for t e [0,60], and for (p 2 (t) a realization of ^'[o, 277 . 0 . 0 . 2 ,aoo](0- 
Finally, we also constrain each component to “live” on only part of the interval, by 
setting 

s(t) = Ai(t) cos(27r(pi(t))x[i8, 60 ] (i) +^ 2 (f) cos(27r(/?2(f))X[o. ae] (f) =: si(t) +S 2 (t) , 

where is the indicator function of [ti,T 2 ]; that is, X[Ti,r 2 ](t) = 1 it ti ^ t < T 2 , 

X[Ti,T 2 ](t) = 0 otherwise. We shall denote the resulting class of two-component 
signals by C. In our examples, signals in C are sampled uniformly at rate I60Hz, 
corresponding to 9600 samples. Figure plots s(t) for one example s S C, as well 
as the instantaneous frequencies (IFs) of its two components, all restricted to the 
subinterval [15,40] C [0,60]. 

Note that the signal s should not be viewed as a random process itself - we 
use the random processes as a means to generate signals consisting of 

several components for which the amplitudes and instantaneous frequencies are not 
easily expressed analytically, but we will not consider or compute expectations with 
respect to these processes - once s G C is generated, we consider it fixed when we 
apply ConceFT to it. (In further subsections, we shall encounter other elements of 


C.) 
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Figure 3. The signal s (in black) and the corresponding instan¬ 
taneous frequencies (in gray) of the two components, restricted to 
the time interval [15,40]. 


To study the performance of ConceFT in the presence of noise, we add noise to 
s(t), setting Yitj.) = s{tk) + where tk is the fc-th sampling time and ^ is a 

stationary random process. We shall consider three different noise models; in each 
case we set the value of a so that the signal to noise ratio (SNR), 

equals 0 dB. The three noise models we consider are Gaussian white noise, an auto- 
regressive-and-moving-average (ARMA) noise and Poisson noise. For the ARMA 
case, we consider an ARMA(1,1) model determined by autoregression polynomial 
a{z) = 0.5z -I- 1 and moving averaging polynomial h{z) = —0.5z -I- 1; for the inno¬ 
vation process we use independent and identically distributed Student t 4 random 
variables. [Note that this ARMA(1,1) noise is not white, because of the time depen¬ 
dence; in addition, the Student t 4 random variable has a “fat-tailed” distribution, 
resulting in possibly spiky realizations.] For the Poisson noise, we pick the ^(tfc) 
to be independent and identically sampled from the Poisson distribution with pa¬ 
rameter A = 1. Figure]^ plots a realization of Y[t) = s(t) -\- cr^(t) for each of these 
three noise processes, restricted to the subinterval [15,40]. 


Clean 

Gaussian 


ARMA(1,1) 


Poisson 

15 20 25 30 35 40 

Figure 4. The restrictions to [15,40] of the clean signal s (top) 
and of the noisy signal T = s -I- where the added noise is 
Gaussian, ARMA(1,1), or Poisson noise (2nd row to bottom, in 
order); in each case a is picked so that the noisy signal has 0 dB 
SNR (signal to noise ratio). All signals are plotted at the same 
scale. 
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4.2. Performance evaluation. To evaluate the performance of ConceFT, we pro¬ 
pose comparing the time-varying power spectrum or tvPS (defined at the end of 
Section of the results of the ConceFT analysis of Y with the ideal time-varying 
power spectrum (itvPS) of our simulated signal s, which can easily be defined ex¬ 
plicitly (because our construction was designed accordingly) as follows: 

2 

k=l 

In order to quantify the (dis)similarity between the ConceFT-estimated tvPS Py 
and the itvPS Pg, we use the Optimal Transport distance (also called the Earth 
Mover distance). Because the principle of ConceFT is to “reassign” content in 
the TF plane, keeping the time-variable fixed (see Section 0, we also keep t fixed 
for the OT-distance. That is, we interpret, at each time t, Py(t,a;) and Pg(t, w) as 
(probability) distributions in uj and compute the OT-distance between them, which 
essentially measures how much one distribution needs to be “deformed” in order to 
coincide with the other; this is repeated for all t, and the average of the t-dependent 
individual OT-distances then indicates the quality of the estimator Py for Pg. 

The precise definition of (the discretized version of) the OT-distance we use is 
given in the ESM; Figure [^displays 4 examples in which two delta-measures local¬ 
ized on curves in the TF-plane (similar to the itvPS defined above) lie at similar 
OT-distances of each other - although in each example the distance indicates a 
different type of “distortion”. Together, these examples give an intuitive under¬ 
standing of the way in which OT distances capture the difference between the TF 
distributions of interest to us here. 

4.3. Parameter selection. As described in [JHl: generalizing the construction in 
orthonormal families of Morse wavelets can be defined for different values of two 

parameters, /3 and 7 ; different choices correspond to different shapes of the domain 
in the TF-plane on which they are mostly localized (see ESM). Once the values of /? 
and 7 are chosen, determining the family of ipj, one also needs to select J, the total 
number of orthonormal reference wavelets used in the ConceET method. For signals 
in C (see0, we explored systematically a range of (/ 3 , 7 ) pairs, as well as different 
values of J, to find the choice that, under different types of noise, with SNR of 0 
dB, gave rise to the smallest OT-based distance (as described above) between the 
itvPS and the ConceFT-estimated tvPS. Surprisingly, the optimal choice depended 
very little on the type of noise; the optimal values we found are /3 = 30, 7 = 9 and 
J = 2. (Detailed results are given in the ESM.) 

4.4. Effect of the number of random projections. The ConceFT Algorithm 
averages the SST results computed with N randomly picked reference wavelets (or 
windows, for STFT) from the linear span of the tpj, j = 1,J. It is expected 
that the concentration in the TF-plane observed with ConceFT kicks in only when 
N is sufficiently large; on the other hand, the larger N, the more expensive the 
computation. To explore the trade-off, we applied ConceFT to the three noisy 
versions of the signal s € C (see subsection 0, with N ranging from 1 to 200. 
In all cases, the ConceFT algorithm uses the optimal parameters as described in 
subsection |4.3[ i.e. it uses the first 2 Morse wavelets with parameters /I = 30, 7 = 9. 
In this simulation, each ConceFT computation was repeated 300 times and the 
mean and standard deviation of the OT distances of the ConceFT-tvPS to the 
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Figure 5. Top row: left: the TF-localization of the ideal time 
varying power spectrum (itvPS) of a 2-component simulated sig¬ 
nal Sa (not showing the amplitude modulation (AM)). All other 
itvPS shown in the top row are for signals S(,, Sc and Sd that have 
a fairly small different OT-distance with respect to Sa; the two 
components have the same time-dependent amplitudes as for Sa, 
but the instantaneous frequency curves have been moved (in or¬ 
der from left to right) by a narrow bump (left), a random dither 
(middle) and a shift (right). Bottom row: illustration of amplitude 
change: left: the original itvPS of Sa with the AM values indicated 
by gray scale level; right: an itvPS example with the same IF but 
different AM. In all the figures, the horizontal axis is time and the 
vertical axis is frequency. The image for each “deformed” itvPS in¬ 
dicates its OT-distance to the original itvPS (shown in the leftmost 
image on each row). 

itvPS were computed. Figure]^ plots the results. For each of the three noise types, 
the graph of the average OT-distance shows an “elbow” shape, i.e. a regime in 
which the decrease is faster, as N increases, followed by one in which the decrease 
is less marked. The elbow is located around N = 20; the standard deviation is 
also quite small for this N. We accordingly decided to set A^ = 20 in our further 
experiments. 

4.5. ConceFT results for noisy signals. We now show the result of using 
ConceFT with the calibrated parameter choices. We illustrate the performance 
of ConceFT on signals of the simulation class C (see subsection]^, for a range of 
SNR, as well as on deterministic signals. 

As a warm-up, we start with the signal s seen before. The top row of Figure]^ 
plots the tvPS Py- of the three noisy versions of s next to the itvPS P^. To compress 
the dynamical range of the tvPS plots, we carry out the following procedure. We 
first normalize the discretized version Py G of Py (where m and n stand 

for the number of discrete frequencies and the number of time samples, respec¬ 
tively) by multiplying it by a constant so that the total weight of all entries equals 
the same number for all cases - i.e., for some 0 > 0 (to be picked - see below). 











Frequency (Hz) Frequency (Flz) 
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Figure 6. The OT distance as a function of the number N of 
random projections. The shaded band indicates the standard de¬ 
viation of the OT distance at the corresponding number of random 
projections. The left column is for the first example and the right 
column for the second. From left to right, the noise types are 
Gaussian, ARMA(1,1) and Poisson respectively. For all three ex¬ 
periments, P = 30, 7 = 9, and the first two Morse wavelets are 
used. 


20 

15 
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Figure 7. First row: results for the signal s; second row: re¬ 
sults for a new example s*. Left to right: ideal time-varying 
TF power spectrum (itvPS) for the clean signal, followed by re¬ 
sults of ConceFT with Morse wavelets after (in order) Gaussian, 
ARMA(1,1) or Poisson noise was added, with SNR of 0 dB. Clearly, 
even for a signal-to-noise ratio is as low as 0 dB, the results approx¬ 
imate the truth with high precision. For each of the tvPS panels, 
the header gives the OT distance to the corresponding itvPS. 


^J2T=iJ2?=i ~ ^ gray-scale visualization of i? G 

rather than the (normalized) Py G itself, where Rk^i := log(l-|-min{Pfc_/, g}), 

k = l,...,m, I = 1 ,... ,n and g is a (very high) cut-off to downplay the effect of 
far-off outliers. We choose q to be the same for all three tvPS, so that comparable 
gray levels on the different tvPS panels indicate comparable values of R (see Section 
4f in the ESM for a more extensive discussion of choosing q and gray-scale plotting 
of tvPS). For the figures, we choose 0 = 5 and q = 5.718; this value for q is the 
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minimum of the 99.8% quantiles of the different tvPSs. The second row of Figure 
gives the results for s*, a signal of the simulation class C that was not used (in 
contrast to s) to calibrate parameters of ConceFT. The results are similarly highly 
accurate. 

Next, we study the effect on the ConceFT performance of the noise level, as 
quantified by SNR. To this end, we revisit the analysis of the signal s* (and s in the 
ESM). For each signal, each type of noise (Gaussian, ARMA(1,1) or Poisson) and 
each SNR considered (SNR= x dB, where x G {—7, — 6 ,..., 6 , 7}), we considered 
20 independent realizations of the noise process; for each of the resulting noisy 
signals we carried out the ConceFT analysis and computed the OT-distance of 
the tvPS to the itvPS of the clean signal; we then computed the mean and the 
standard deviation for each. The results are shown in Figure The same figure 
also compares the ConceFT results with those of simple SST (using either the first 
Morse wavelet with parameters j3 = 30, 7 = 9 as reference wavelet, or one random 
linear combination of the two first Morse wavelets) and of multi-taper SST (denoted 
as orgMT), using the same ipj as ConceFT. For each of these alternate methods, 
we likewise computed the mean OT-distance of the tvPS to the itvPS for 20 noise 
realizations. It is striking that the ConceFT method outperforms the other methods 
in all cases. 



Figure 8 . OT distance of ConceFT tvPS results against signal to 
noise ratio (SNR) of the signal s*{t), and comparison with stan¬ 
dard SST and standard multi-taper SST (see text). Noise type 
(left to right): Gaussian, ARMA(1,1), and Poisson. The standard 
deviation is smaller, at the scale of this figure, than the height of 
the markers, and has not been plotted. 

Finally, to address possible concerns that the randomness in the generation and 
plots of and A{t) somehow “help” ConceFT in these estimations, we show 
in Figure the results for yet another signal, which (in contrast to s and s*) is 
completely deterministic; it consists of 3 components, each given by an explicit, 
analytic formula (again for t G [0,60]): 

s°(t) =X[io. 48 ](^) (1 + 0.3cos(7r(t — 10)/20)^) cos ( 71/3 -I- 5< -I- t^/50) 

-I- ( 0.4 -I- 0.9sin(7rt/60)^) cos (12t -|- sin(7rt/6)) -I- cos (lit + {t — 35)^/800) 

Figure]^ shows that the results are of a quality similar to those in Figure]^ 

4.6. ConceFT with STFT. As described earlier, the ConceFT approach can be 
carried out for STFT-based SST as well as for CWT-based SST. Figure[^in section[2 
already showed the results of STFT-ConceFT on one example. Other examples are 
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Figure 9. Results for the three-component deterministic signal 
s°. Left: ideal time-varying TF power spectrum (itvPS) for the 
clean signal, followed by results of ConceFT with Morse wavelets 
after (in order) Gaussian, ARMA(1,1) or Poisson noise was added, 
with SNR of 0 dB. 


shown in the ESM, together with values of the OT-distance of the STFT-ConceFT 
estimated tvPS to the itvPS. In these experiments, as in Figure the reference 
windows are chosen to be Hermite functions; 20 random projections are used to 
compute the ConceFT averages. Although STFT-based ConceFT achieves better 
OT-distance with respect to the ground truth than STFT-based multi-taper SST, 
and also achieves a better reduction of “background noise” (i.e. the structures in 
zones away from itvPS concentration, due to fortuitous correlations between the 
noise and the overcomplete frame of TF reference functions; see the description in 
section [^, the performance of STFT-based ConceFT is not quite as impressive, on 
the class C, as CWT-based ConceFT. We provide some discussion in the ESM. 


5. Conclusion 

We consider signals that are the linear combination of a small number of “intrinsic¬ 
mode functions”, each of which can be reasonably viewed as an oscillatory function 
with well-defined but time-varying amplitude and “instantaneous frequency”. We 
have introduced a new approach, called ConceFT, to determine the time-frequency 
representation of such signals, combining multi-taper estimation ideas and aver¬ 
aging over random projections with synchrosqueezing. Theoretical analysis shows 
that this leads to improved estimation of the time-varying characteristics of the 
signals of interest; numerical results confirm the theoretical promise, even when the 
signals are corrupted by significant and challenging noise. 

We also introduced two tools to evaluate the effectiveness of this method (or 
other similar methods), which may be of interest in their own right to others work¬ 
ing in the TF field. On the one hand, we introduced a class of explicit, easy to 
construct signals with explicit time-varying characteristics, even though the signals 
themselves are not given by explicit formulas; the explicit time-varying amplitude 
and instantaneous frequency give a “ground truth” with which estimations can be 
compared. On the other hand, we introduced a distance between time-frequency 
representations that can be useful in comparing results obtained by different meth¬ 
ods, by computing for each the distance to the “ground truth” time-frequency 
representation. 
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Electronic Supplementary Materials for 
“ConceFT: Concentration of frequency and time 
via a multi-tapered synchrosqueezing transform” 

ESM-1. Introduction 

These are the Electronic Supplementary Materials for the paper ConceFT: 
Concentration of frequency and time via a multi-tapered synchrosqueez¬ 
ing transform. They contain, in particular, precise mathematical definitions, 
theorem statements and proofs that complement the more general exposition in 
the main body of the paper, as well as details about the numerical examples and 
additional examples. For the convenience of the reader, the organization into sec¬ 
tions follows that of the paper; for instance, material in Section ESM-3 complements 
Section 3 in the main paper. 


ESM-2. The ConceFT Algorithm: Several Remarks 


(A) As described in Section 2 in the main paper, the SST-steps in ConceFT 
involve the computation of a partial derivative, dbWj'^\a,b) with respect to the 
localization parameter b of wj'^\a,b). In practice, one has W^j^\a,b) only for 
discrete (as opposed to continuous) values of a and &, and partial differentiation 
would be approximated by a differentiating scheme. This can cause stability issues 
when / is noisy. Using the definition of IF|’^^(o, 5) as the inner product of / with 

one can compute dbWf{a,b) via the wavelet transform of / with 

respect to the wavelet i/f' using dbWj'^\a,b) = —W'j^ \a,b), typically this makes 
the computation more stable than simple numerical differencing. 

(B) The ConceFT algorithm consists in taking the average of many nonlinear 

SST estimates of the tvPS, each of which results from a wavelet transform with 
respect to a randomly picked reference wavelet; for each individual transform the 
corresponding reassignment is computed and carried out to hnd that individual 
SST. An alternative to the individual SSTs would be to define one “master” re¬ 
assignment rule, as follows. From the collection of j = 1,..., J, we could 

estimate b) as the value of ^ for which the vector 

w{a,b) = [W^/\a,b),dbW^/\...,Wy\a,b),dbWy\a,b)]eC^'^ 
is most “aligned” with the vector u{a,b,^) 


uia,b,0 = [W^JU{a,b),dbW%,,.. .,W%,{a,b),dbW%,{a,b)] e 


In other words, the reassignment rule would become 


U(a, b) := argmax 


\{w{a,b),u{a,b,^))\ 
||uj(a,6)|l|lM(a, 6,011 ' 


Although numerical experiments have shown this to be an interesting approach as 
well, we shall not pursue this in this paper. 

(C) In most of our examples and figures, we concentrate on visualizing the loca¬ 
tion in the TF plane of the curves characterizing the different IMT components of 
the signals considered. However, we can also use the tvPS constructed by ConceFT 
to estimate the different amplitudes, as follows. Each ConceFT tvPS is the average 
of many SSTs constructed in such a way that the integral (sum, in practice) over 
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on an interval around (j)'i{t), approximates Ai{t) cos(27r(/)/(t)) (see [Ml[8]). It follows 
that one can use the ConceFT representation to first identify the for all t 

(which can be done more stably with ConceFT, for large noise, than with simple 
SST of MTSST), and then integrate with respect to ^ in an appropriate 

interval around to recover Ai{t). 

ESM-3. Theoretical Results: Mathematical statements and proofs 

The following is the mathematically precise definition of an intrinsic-mode type 
(IMT) function: 

Definition S.l. Given e, ci and C 2 satisfying 0 < e ^ 1, 0 < ci < C 2 < oo, a 
function F{t) is said to be of type if it can be written as 

F{t) = A{t) cos(27r(/3(t)), 

where 

' AGC\m.)nL°°(R), 
inftgK A{t) > Cl, inftgR (p'{t) > Ci, 

' SUPtgR A(t) < C2, SUPigR(/7'(t) < C2, 

|^^(i)| < for all < S K, 

To model the oscillatory functions with different oscillatory modes, we also con¬ 
sider superpositions of IMT functions: 

Definition S.2. Given e, ci, C 2 and d satisfying 0 < e ^ I, 0 < ci < C 2 < oo, 
0 < d < 1, a function G{t) is said to be of type if it can be written as 

L L 

(5.2) G(t)=^F,(t)^ Ai{t) cos{2Tr(pi{t)), 

e=i i=i 

where each Fi = At{-) cos(27r(p^(-)) is of type 

(5.3) > d(v5«+i(0 + v'Ai)) 

for all £ = I,..., L — I. 

Finally, we also consider the additive white Gaussian noise. Denote S to be the 
Schwartz function space. Our model for the observed signal Y (t) is thus 

L L 

(5.4) Y{t)=J2 Fi{t) tT$(t) = ^ Ag{t) cos{2TT(fe{t)) + cr$(t), 

e=i e=i 

where G = is of type , <& is a Gaussian white noise so that the 

standard deviation of 'it (ip) is 1 for all i/: G 5 with norm 1, and cr > 0 is the noise 
level; F is a generalized random process, since by definition Ai{t) cos{2nipg{t)) 
is a tempered distribution. 

The J reference wavelets 'ipi,... jtpj are orthonormal, that is, J 'ipi{x)'ipj{x)(lx = 
Sij, where Sij is the Kronecker delta. For simplicity we assume that the tpj all 
have fast decay, that their Fourier transforms ipj are real functions with compact 
support, and suppi/:^ G [1 — ? I where 0 G Aj < 1. 
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We shall consider the continuous wavelet transforms of Y with respect to the 
ipj, and apply synchrosqueezing to them. For the F^-components of Y we refer 
the reader to the detailed analysis in mm- In particular, we introduce the sets 


Z^/\b) = 


l-A. 1+A. 


. V'lib) ’ VW _ 


If each Aj satisfies Aj < 


for j = I,..., J (which 


we shall assume for the remainder of this discussion), then one finds that, by the 
conditions on G, the sets z[^\h) are disjoint. Moreover, the CWT W^^\a,b) is 
small except for those pairs (a, 6) where a S z[^\b) for some £, and in that case 

-idbW^^\a,b) 


2'j7W^^\a,b) 


is close to <^'i{b). (See Theorem 3.3 in [13].) 


It will be convenient to use A := min^^^(Aj), A := max^^;^(Aj), and Zt{b) = 


l-A l+A 
v'^<,b) > <p'^(b) 


. Clearly Zt{b) = r\-l^^z[^\h). 


As shown by the analysis in we have 


W^^^\a,b) = 


b) + ej{a, b) 

ej{a,b) 


when a € Z^^\b) for some £ = 1,..., 
otherwise. 


where 

(S.5) 


Qj^e{a,b) = Ai{b)^/aiij{aip'i(b)) e K 

and ej(a, b) is of order Z = for all j = 1,..., J. 

Adding also the noise, we have thus 

L 

b) = Y, &)x^o) (&) + e,(a, b) + iTd>(V>f 

where ■= -^'4’j (^) and Xz^-^^b) indicator function of the set 

Z^ib). 

To simplify further notation, we shall use boldface for J-dimensional “vector” 
quantities; for instance we denote '0 := [0i, 02 , • ■ •, 0j]''’ € 0*^5 and := 
where r S S‘^~^ = {u G ; ||i;|p = = !}■ Clearly 0M is 

also a Schwartz function, with supp C [1 — A, 1 + A]. We similarly intro¬ 

duce e(a, b) := [ei(a, 6),..., ej(a, 6)]’’’ (a vector with norm of order e), (5.g(a, b) := 
[Qi,e{a, &),■■■, Qj,e{a, 6)]'''. Note that all the entries of the vectors Qf(a, b) are real; 
this will be important for our estimates below. 4>(a, b) := ..., $(0j“’^^)]‘''. 

#(a, &) is a complex Gaussian random vector |22] . of which the following Lemma 
gives some basic properties: 

Lemma S.3. For all a > 0 and 6 G M, 4>(a, 6) is a complex Gaussian random 
vector with mean [0,..., O]’’’ G K'^, for which the covariance matrix and the relation 
matrix both equal Ijxj- Thus, for all v G K'^, v'^^{a,b) is a complex Gaussian 
random variable with mean 0 and variance Ill’ll^. 

Proof. Fix a > 0 and 6 G ffi. Since $ is a Gaussian white noise and 0 is a complex 
Schwartz function, it follows that for j = 1,..., J, <I>(0j“’^^) is a complex Gaussian 
random variable [23] . By definition, its mean is 0 and its variance is 


(S.6) Var($(0j“’'’^)) =E|$(05“’^^)P = J 


0(-o)(^) 


— ll^illi2(R) — 1- 
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It is clear that the variance of ) is independent of the scale a. Since and ijjj 

are orthogonal ifi ^ j, a similar calculation shows that Cov(<i)('(/;j-“’^^), = 

Sij. Since we assume that ipj is real for all j = 1,..., J, the relation matrix 
of $(a, 6) equals the covariance matrix, and is thus Ijxj as well. It then easily 
follows that v'^^(a^b) is a complex Gaussian random variable with mean 0 and 
variance ||v|p. □ 


Because the CWT is (anti)linear in the wavelet with respect to which it is com¬ 
puted, the CWT of Y with respect to ^[’’1 is given by 
(S.7) 

L J 

'\a,b) = + rT [e{a,b) + a^{a,b)] . 

£=1 j=l 


The analysis in di also shows that 


d b) = I &)-|-ej(a,6)) when a G (b) for some £ 

^ ' y z27r?j(a, &) otherwise, 

where ej(a, b) is of order e for all j = 1,..., J. We thus obtain 


—idbW^ ^\a,b) = 27r EE b)xz(/){a, b)2Trr'^ [^e(a, b) + cr$(a, b) 

1=1j=i 

where?(a, b) = [?i(a, 6),...,?j(a, b)Y, and $(a, b) = (27r)-i[$(z(?/’i“’*'^)'), ■ • •, 

Here e{a,b) is again a J-dim random vector with norm of order e. The following 
lemma gives some basic properties of the complex random vector $(a, b): 


L R 


Lemma S.4. For all a > 0 and 6 G M, 4>(a, 6) is a complex Gaussian random 
vector with mean [0, ...,0]'’’ G for which the covariance matrix and the re¬ 
lation matrix both equal diag[||V'lP, • ■ •, V'jlP]/(27ra)^ G Thus, for all 

V G 6) is a complex Gaussian random variable with mean 0 and vari¬ 

ance E/=i II V'i 117(2™)^- 


Proof. The proof is the same as that of Lemma [S.3[ except for the following slight 
difference: 


Var(cl>(*(V>f’'))') = = IKV’f’'^)'II l^(b) = IIV'' lli^(B)/a" 

As a result, when a G Zi{b), the reassignment rule, lo^ \a, b), becomes 

-idbW^'^\a,b)+ 2TTr'^a^{a,b) 

Wy (a, 0 ) — - 


□ 


27r '^(a, b) -|- arT$(a, 6)^ 

+ e{a,b)+a^{a,b) 


rT 


b) + e(a, b) + a^{a, &)] 


It follows from Lemma 


S.3 


and Lemma 


S.4 


that uj^ ^ (a, b) is a ratio random 


variable of two independent complex Gaussian random variables with non-zero 


means. 
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Note that we are implicitly assuming here that the denominator in the fraction 
for uj^ is not too small (see Section 2 in the main paper). In what follows, 

we shall make this explicit: we shall always assume that 


27T\wlf''''\a,b)\ ( = 


b) + e(a, b) + a^(a, b) 


if a € Zi{b) ^ 


exceeds the value 2k, where the value of k can be set (according to the signal 
characteristics and noise level). At the same time, we shall assume that e and a 
are sufficiently small that 


E (||e(a,5) + cr$(a,5)|p) < . 


(This means that the threshold for the reassignment rule must be set in accordance 
with the rate of change of the amplitudes and the instantaneous frequencies of the 
individual constituent components in the clean signal, as well as with the level of 
the noise - both eminently reasonable restrictions.) We shall see below how these 
restrictions will come into play. 

Let us first prove some technical Lemmas. 


Lemma S.5. Fix J € N and k > 0. Denote := {r G S'^ ^; r'^v > k}. For 
u G C'’ and v G we have 


(S.8) 


1^. 


rT V 


dr = p^u p^'^u + 


where is the real part of u, Ait is the imaginary part of u and p„(m) is the 
component of the vector w along the direction of v, pv{w) := Further¬ 

more 


(S.9) 


1 


f 

r'^u 

Is. 

riv 


dr = -I- c 


wn^wi 

j-1 


where is the projection operator onto the subspace perpendicular to v and 


2r((J-l)/2) (l-a:2)('^-i)/2 2^2 

V^r(J/2) A 7^' 


Proof. Without loss of generality, we can assume that u G M.'p and |jr|| = 1. We 
can find TZ G SO{J) so that TZv = ei, where ei := [1,0,..., 0]''' € Under 

this change of variable, we write 

TZu = [di,d 2 ,..djY G 


where 


di = u'^v = pv{u). 


1 

f r'^u 1 



Js. I-5'kI 

Js. {T^rVTZv 

1 

r rT 

[di,d2,. ■ (ij]T 

•S'k 

J ri>K} 

rT ei 

1 

f ^ 


|5«| 

J ri>K} 

Qf , 
ri 


As a result, we have 
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which becomes di = p^u since, for J 1, ^ is an odd function of rj, and the 
domain of integration is invariant under sign reversal of rj. For the second part, 
note that by the same change of variable, we have 


(E/=i rjdj 












where the last equality holds because = 0 since 

in each term with i ^ j, there is at least one index different from 1, so that 
this term changes sign when it is mirrored with respect to that index, while the 
domain of integration is invariant under this mirroring operation. For the other 
terms, note that when j = 2,... ,p, symmetry arguments imply that 

— / 

|<5'k| d{reS-2-i; ri>K} 

J 1 J{r^S'^~^-,ri>K} 

= —— / ^dr. 

To evaluate the last term, we use spherical coordinates in J dimensions. Rewrite 
r = [ri,..., rj]T G as 

ri = cos (pi 
r2 = sin(pi COSV32 


rj_i = sinvsi.. .sm(/?j_2COSV3j-i 
[ rj = sin (/?i .. .sin (pj _2 sin V3j_i, 

where pi,..., (pj -2 G [0, tt) and tpj-i G [0, 2'k). In this coordinate system, the vol¬ 
ume form becomes dr = (sin"^“^ (pi)(sin'^“^ (p 2 ) ■ ■ ■ (sin(/?j_ 2 ) d(/?j_i dipj _2 ■ ■ ■ d(/?i; 
we obtain thus 


d{r’eS''-i; ri>K} 

ofT r sin-''"Vi sin'^"^V52 • ■ • ,sin(pj _2 

= 2/ / ... / / -5-dv3j_id(pj_2... dv3i 

JU Jo Jo Jo cos"' (fii 

= 2\S^-^\ f 

cos"' ipi 
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where G [0,7r/2]; cos(/3i > k}. Similarly, we have 


/ dr = 2 15'“'-^! / sin^-^idv?!. 

By putting the above together, we have finished the claim since 

|2 


where 

(S.IO) 


C = 


215'-^ 


1 

-21 


f 

r'^u 

's^ 

rT V 


dr = d? + c + c 


sin'^ ^ tpi 


2 r((J-l)/ 2 ) /■ sin^^ 


I'S'kI Ji^ Vcos2(^i y 0rr(J/2) Jj^cos^ip 

Notice that the Gamma function ratio de asymptotically approxi¬ 

mated by (J/2)“^/^ as J —)■ oo and that 


sin'^V^, /■^(l —i)/2 
- ^—dip = / -^- du = 


Jl^ cos" If u" 

It follows that c is approximately 

(S.ll) 


4(l + 0(u"))dR= -+0(1). 


2v^ 

VttJk 


□ 


We are now ready to study the statistical behavior of ui^ ^ (a, b) as the unit vec¬ 
tor r is picked randomly, uniformly in = {r G S'^~^ ; r^ 5 ) _l_ b) 

2k} C In the next proposition, we keep £, b and a fixed, on the understand¬ 

ing that a G Zi(b). To ease up on notation, we shall suppress {a,b) and i in the 


a^{a,b)) 


notation, and use uj^ \ Q, (p{b), c, S'^, etc, to denote ujy \a,b), Qg{a,b), 
ipe{b),e{a,b), $(a, 6), si^\ etc. 

Proposition S.6. Fix a realization of d), k > 0, 5 G K and a G Zi{b). Assume that 
r is sampled uniformly from S'„ = {r G S'^~^ ; r^ + e -|- (t^>) > 2k} C 

S-^-\ When \\e + a^\\l < K, we have 


[r] 


(r), 


(S.12) 




= p}'{b) + (? + - >p'{b)[e + a#]) + E^, 


where is the expectation of \a, b) as r is sampled randomly and uniformly 

from S'k, and Ei is bounded by 

(S.13) 


\Ei\< 


I 


1 - 


J- 1 


IPq (^e + cr$ - ip'ib)[£ + cr$]^ p + C - 


■ (J$ — Lp'{b)[£ + cr$]|j" 

J-1 


1/2 


Furthermore we have 
I - 


Var,4*'’'' < I 


c 

J-1 


IPq -I- cr$ - p>'ib)[e + cr4>]^ P + c 


|e -I- (T$ — ‘p'{b)[e -(- a< 

J-1 


where Var„ is the variance of uj 


Y 


over Sk- 


Before the proof, we have the following remark about the Proposition. 
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Remark. In the statement of this proposition, we encounter several times the ex¬ 
pression IPq V| (using the shorthand notation V = e + — (p'{b)[e -|- cr#]), which 

can be bounded by ||V||. In practice, however, the term IpqVI will likely be signif¬ 
icantly smaller than its norm, with high probability if J is large. Indeed, the vector 
Q is fixed, while the vector V is a random vector in J dimensions, depending on 
the random realization of the noise function $, which is much more likely than not 
to lie in a region near the equator, perpendicular to Q, since this region contributes 
the lion share of the sphere “area” (really a J — 1-dimensional volume), increasingly 
so as J increases. Denoting Ij := {ipi € [ 0 , 7 r/ 2 ]; 0 < cos(i^i) < 7 }, we have indeed 

|{r e 0 < ri < y}] 

I / ... (sin(^j_2) d(pj_2 ... d(pi 

0 ^0 

= \S^~‘^\ I sin^~^ (pid(pi. 

Ji^ 



Consequently, 

|{r e SJ-^; 0 < ri < 1 }| “ ipd^ ~ ~ (I - u'^yj-^y^du 

which approaches 1 as J increases to 00 . 


Proof, (of the Proposition.) To simplify the notation in the computation, we set 

A := (^'(6)e*2’^^WQ, a:=e + a^,B := = A/(p'{b), b :^e + a^. 

By the assumption that r is sampled uniformly from Sk, we have 


r _ A 
^ - \S, 


r'^ {Ap a) 1 

-ClT* = - 

rT {B + h) 


P'{b) 

1-5.1 


1 -P 


(S.14) =^'( 6 )+ 1 /■ 

P.| Js, 


IS.lJ 

rT [B -I- b) 

rT (a — ip'(b)b) 
rT [B -|- b) 


f r'^ {ip'{b)B + a) 
S„ r-r {B -P b) 

dr 

dr. 


dr 


We next use the identity 

rT (a — (/?'( 6 )b) r'^ {a — ip'{b)b) r'^b r'^ {a — p'{b)b) 


rT [B -f b) 


rT B 


rT B rT (B -b b) 


combining it with Lemma S.5 (since Q G Mt*), to obtain 
(S.15) 
where 

(S.M) E, := - ' f 


E.-ub''’’ =/t(<>)+e“‘“''“'>Pa (? + »*-*)'(I))|e + »#l) +Ei, 


1 

1^ 


dr. 


5 ^ rT B rT [B + b) 

Note that by the assumptions that ||e-|-fT $||2 < k and |rT -b e -b cr$) | > 

2 At, we have 

|rT6| ^ 1 

|rT {B + b)\ ^ 2 ’ 
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SO that 


\Ei\< 


IrT (a — tp'{b)b) 


\S, 


k| Js^ 


2|rTg| 


dr. 


Next, we apply the Cauchy-Schwarz inequality to this integral, together with Lemma 
IS.51 which leads to 


\EA< 


f dr 

1/2 

' f |rT (a-(/)'(5)b) |2' 

Js^ 


[Js. 2|rTQ|2 J 


-I 1/2 


1 

1^ 


1 /, . .., WPoia- ip'{b)b) 


< ^ {\PQ {a - (fi'ib)b) f + c 


12 \ 1/2 


J- 1 


1 - 


1 - 


J- 1 
c 

J-1 


(a - ff'{h)b) P + c 




J- 1 


I /- ^ |2 ||e + (7$ - (p'(6)[e + cr$]|| 

IPq (^e + - <f'{b)[e + a$]j 1^ + c ^^ 


1/2 


The variance can be evaluated in the same manner. Noting that for a random 
variable X, VarX = Var(X — c) for any constant c, we have 


(V-M) 


Var^Wy, 




[ 

r^ (a — (p'{b)b) 


rT (B + b) 

f 

rT (a — ip'[b)b) 


rT (B + 6) 


dr — 


dr 


1 


r'’' (a — (p'{b)b) 

rT [B + b) 


dr 


This last expression (S.17) can be bounded by 


2 r ( 

rT (a — (f'{b)b) 

2 

rTb rT (a — ip'{b)b) 

Js. [ 

rT B 


rT B rT (B + 6) 


dr 


< 


2 5 

^4, 


r^ (a — ip'{b)b) 


r'^ B 


dr . 


We have encountered this exact same integral before, and bounded it by invoking 
Lemma FS.51 We thus obtain 


_ 5 
“ 2 


1 - 


1 - 


J- 1 
c 

J- 1 


IPq {a-ip'{b)b) |2 +c 


\a-(f'{b)b\\ 
J -I 


IPq {e + (J^ - ip’{b)[e + (7$]^ P + c 


]e + (t€> — (p'{b)[e + cr^] 
J- 1 


□ 

This concludes this section concerning the details for the technical estimates in 
section 3 of the main paper. 
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ESM-4. Numerical results 


As described in the main paper, we consider both CWT and STFT-based Con¬ 
ceFT representations. In both cases, the orthogonal family of reference functions 
(wavelets for the CWT, windows for the STFT) are the eigenfunctions, up to a cer¬ 
tain order, of a time-frequency localization operator that is particularly well suited 
to the CWT or STFT framework [151 ESI EU El]- Figure S.l below shows the 


shape and size of TF domains of this type. In both cases, the shapes correspond 
to a two-parameter family, and the localization operators behave approximately 
like projection operators. More precisely, once the parameters A determining the 
shape are picked, there is a natural family of (commuting) operators and an 

orthonormal family of functions such that 

where the eigenvalues all between 0 and 1, constitute a strictly decreasing 

sequence, tending to 0 as j tends to oo; for fixed A and j, each increases 

with R, tending to 1 as i? (which indicates the size of the region characterized by A) 
tends to oo. The eigenfunctions themselves (which do not depend on R) are scaled 
and possibly chirped Hermite functions for the STFT case, and Morse functions in 
the CWT case. 



frequency 

frequency 


5 ^^- ••• 


time 





1 


eigenvalues 


0 



eigenvalues 


30 j 


0 


20 30 


Figure S.l. Localization domains in the TF plane for the refer¬ 
ence windows or CWT reference wavelets: Top: CWT, bottom: 
STFT. From left to right: different shapes of the TF domain, cor¬ 
responding to different parameter choices A; different sizes of one 
domain shape, corresponding (for one fixed A) to different R; eigen¬ 
values Ej^''^\ for different R. 


It seems natural to pick these special orthonormal families, since each family 
provides, in some sense (made precise in Ea ig EH EH) the “best” localization, 
simultaneously, by different orthonormal functions, for one shared time-frequency 
domain. (A similar reason underlies the choice, in standard multi-taper methods for 
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spectral estimation, of the prolate spheroidal wave functions for the taper functions 
[SHUHi].) However, the method does not depend on these particular choices, and 
it is not only conceivable, but indeed likely, that for particular applications, other 
choices may be more suitable and give better results. 


ESM-4a Data simulation. Figure S.2 below shows the graph of (the restriction 
to [15,40] of) another signal s* G C. This signal is used in the main paper to 
illustrate the action of ConceFT on a signal from C that has played no role in 
calibrating the ConceFT parameters (unlike s). 



Figure S.2. Another signal s* (in black) in C, and the corre¬ 
sponding instantaneous frequencies (in gray) of the two compo¬ 
nents, restricted to the time interval [15,40]. 


Figure S.3 plots a realization of Y*{t) = s*{t) + cr^{t) for each of the three 
noise processes (Gaussian, ARMA(1,1) and Poisson), restricted to the subinterval 
[15,40]. 



Figure S.3. The restrictions to [15,40] of the noisy signal Y* = 
s* + (J^ (2nd row to bottom), where s* is the clean signal from 
the previous figure (plotted again in the top row) and where the 
added noise is Gaussian, ARM A, or Poisson noise (in order, from); 
in each case a is picked so that the noisy signal has 0 dB SNR 
(signal to noise ratio). The four plots are at the same vertical scale. 


ESM-4b Performance evaluation. To assess the performance of ConceFT, we 
must compare the time-varying Power Spectrum (tvPS) Pv, as estimated via Con¬ 
ceFT, with the ideal time-varying power spectrum (itvPS) of the clean simulated 
signal s, defined (in a natural interpretation of its construction procedure) as 

2 

Ps(iiw) := 'y ] ■^k{t)dip\_(t){u>)- 
k=l 
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Viewing both the itvPS and the tvPS as distributions on the TF-plane, we want to 
assess, in particular, whether the regions in the TF plane where they each concen¬ 
trate, coincide or lie close to each other. The Optimal Transport (OT) distance (also 
called the Earth Mover distance) is a distance that is designed to do this: given 
two probability measures on the same set, their OT-distance gives the amount of 
“work” needed to “deform” one into the other. A bit more precisely, it computes the 
total (integral/sum of the product) mass x distance traveled for the transformation 
(i.e. the transportation plan that minimizes this quantity) that maps one to the 
other. Because the principle of ConceFT is to “reassign” content in the TF plane, 
keeping the time-variable fixed (see Section 2), we also compute the OT-distance 
for each individual t (keeping t fixed), and then take the average over all t G [0, T], 
This has a fortuitous advantage, in that it reduces the OT-distance computations to 
1 -dimensional problems, for which there exists a computational short-cut: the stan¬ 
dard definition for the OT-distance between probability distributions p, and zz on a 
metric space (S, d) involves an optimization over V{pi,v), the set of all probability 
measures on S x S that have p and v as marginals, 

doT{p,v) ■■= inf / d{x,y)dp{x,y) , 


which can be computationally quite expensive. In the one-dimensional case (i.e. 
when S C K, and d is the canonical Euclidean distance, d{x, y) = \x — y\), however, 
it turns out (see e.g. section 2.2 in EHl) that, defining /^(x) = dp (analogously 
for /j^), we have 

doT{l^.v)= \f^[x) - f^{x)\dx . 

The OT-distance is defined for probability distributions, and it is by no means 
guaranteed that the positive functions Py(t, •) and Ps(t, •) have integral 1 for all t\ 
for this reason, we normalize them before computing their OT-distance. We may 
also want to capture (and penalize in the distance metric) possible differences in 
the total weights of Pv(t, •) and Ps(t, ■)', we can introduce a term for this as well. 
More precisely, assuming that the frequency domain over which Py and Pg range is 
[0, n], and assuming also that /p /p Py(t,w)dwdt = /p Ps(t, w) dw dt (which 
can be achieved by multiplying Py with a constant, if necessary), we define 


Da(Py 


P«) 


= a 


PY(t,uj) = / Pr(t,Od^ 

do 

PY(t,Uj) = PY(t,Uj)/pY(t, fl) 

T Jo PY{t,n) -\-ps{t,n) 


Ps{t,uj)= / Pg(t,^)d^ 

^0 

Ps{t,Uj) = Ps(t,0j)/ps{t,U) 

1 ^ 

(1-a);^ / / \pY{t,oj) - ps{t,uj)\ dujdt 

^ Jo Jo 


In practice, we picked a = 0 in our evaluations, since the corresponding pure OT 
distance already gave us a reasonable way to quantify how well a tvPS reflected 
“its” itvPS, consistent with our (subjective) appraisals. In concrete computations, 
the integrals are approximated by sums of the corresponding discretized quantities. 


ESM-4c Parameter Selection. In this subsection, we report the details of our 
exploration of the parameter space, leading to our choice of /3 = 30, 7 = 9 and 
J = 2 as the optimal one for the CWT-based ConceFT algorithm, when applied to 
the signal class C. 
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We applied ConceFT to the noisy signals, with 7 = 3,4, ••• ,10 (8 choices); 
/? = 20, 30, • • • , 70 (6 choices) and J = 1, 2, 3,4 (4 choices). All 192 possible com¬ 
binations of these options are investigated. For each example and each parameter 
setting, we applied the ConceFT algorithm 10 times, each time with 10 random 
projections; the average of the OT distances over these 10 attempts was then com¬ 
puted. 

Figure [S^ visualizes the results by means of a “heat map”. In this figure, the x- 
axis is the selection of 7 and (3, the y-axis is J and the color at each entry represents 
the averaged OT distance for the corresponding choice of the parameters 7, (3 and 
J; the lighter the color, the smaller the OT distance and hence the better the 
performance. The Figure shows the averaged OT distance of the ConceFT result 
for all choices of parameters, for one signal in C, and three types of noise, giving 
three heat maps in total. The cc-coordinate in each heat map cycles through the 6 
values of /3 before it moves on to a new value of 7; Table [STT| below gives the value 
of X for each pair of Morse parameters considered. 



7 = 3 

7 = 4 

7 = 5 

7 = 6 

7 = 7 

7 = 8 

7 = 9 

7 = 10 

o 

II 

1 

7 

13 

19 

25 

31 

37 

43 

= 30 

2 

8 

14 

20 

26 

32 

38 

44 

o 

II 

3 

9 

15 

21 

27 

33 

38 

45 

13 = 50 

4 

10 

16 

22 

28 

34 

39 

46 

/3 = 60 

5 

11 

17 

23 

29 

35 

40 

47 

II 

o 

6 

12 

18 

24 

30 

36 

41 

48 


Table S.l. The numbers on the x axis and their corresponding 
Morse parameters. 



1 7 13 19 25 31 37 43 1 7 13 19 25 31 37 43 1 7 13 1 9 25 31 37 43 

the Morse parameters the Morse parameters the Morse parameters 


Figure S.4. Exploring the parameter space: heat maps visu¬ 
alizing the OT distance between the itvPS for a clean signal and 
the ConceFT-based tvPS of noisy versions, with SNR of 0 dB, for 
a two-component signal in C, and for three different types of 
additive noise: Gaussian (left), ARMA(1,1) (middle) and Poisson 
(right). Each heat map shows the results for the 192 different pa¬ 
rameter combinations described in the text. The color of each box 
represents the OT distance: lighter colors indicate better perfor¬ 
mance. 

When we computed similar heat maps for other randomly picked signals in C, 
the results were virtually identical. Our exploration showed that the combination 
/3 = 30, 7 = 9, J = 2 lead to the best performance; we thus chose these values for 
the remainder of the paper. 
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ESM-4e. ConceFT results for noisy signals. Figure [ST5| is the analog of Figure 
8 in the main paper, for s, the signal used to calibrate the parameter N for the 
ConceFT algorithm rather than the “new” signal s*. 



Figure S.5. OT distance of CWT-based ConceFT results 
against signal to noise ratio (SNR) for the signal s{t), and compar¬ 
ison with standard SST (with respect to the lowest-order Morse 
wavelet in black, and a random combination of the first two Morse 
wavelets in green) and standard multi-taper SST, both also CWT- 
based (see text). Noise type (left to right): Gaussian, ARMA(1,1), 
and Poisson. The ConceFT result is the mean OT-distance for 
20 independent ConceFT computations; the standard deviation is 
smaller than the size of the marker. 


For each type of noise (Gaussian, ARMA(1,1) or Poisson) and each SNR consid¬ 
ered (i.e. X dB, where x G {—7, —6 ,..., 6, 7}), 20 independent realizations of the 
noise process are considered; for each of the resulting noisy signals the ConceFT 
analysis and the OT-distance of the resulting tvPS to the itvPS of the clean signal 
are computed; the mean and the standard deviation for each are shown in Figure 
ESM-4.6. 

Figure S.5 also compares the ConceFT results with those of simple SST (using 
either the first Morse wavelet with parameters /3 = 30, 7 = 9 as reference wavelet, 
or one random linear combination of the two first Morse wavelets) and of multi¬ 
taper SST (denoted as orgMT), using the same ijjj as ConceFT. For each of these 
alternate methods, we likewise computed the mean OT-distance of the tvPS to the 
itvPS for 20 noise realizations. (Note that the results are very similar to those in 
Figure 8 in the main paper, for s*.) 


ESM-4.f ConceFT with STFT. As a complement to Figure 2 of the main paper, 
which shows STFT-based ConceFT results and compares them with other SST- 
based algorithms (simple STFT-based SST with a Gaussian window, or multi-taper 
SST), we show below the results of STFT-based ConceFT for the same signals s 
and s* for which the main paper showed, in Figure 7, CWT-based ConceFT results. 
Before we do this, we give some more details about how these STFT-based ConceFT 
results are obtained. 

We explain in the main paper that SST can be defined starting from a STFT 
just as well as from a CWT. The whole ConceFT analysis, theoretical as well as 
numerical, can be carried out equally well using such STFT-SST representations. 
At the start of Section ESM-4, above, we explained the rationale for choosing Morse 
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functions for the ijjj ; this rationale leads similarly to the choice of Hermite functions 
as the natural basis windows hj for STFT-based ConceFT. As in the CWT case, 
the choice of the family is completely fixed by the values of two parameters; in the 
STFT case these correspond to the eccentricity of the elliptic localization domain 
in the TF-plane and the tilt of the major axis of this ellipse with the time-axis (see 
Figure S.l bottom-left). Since there is no a priori reason to expect that chirping 
the Hermite functions in any direction (which tilting the elliptic domain would lead 
to) gives any advantage, we left this parameter out of consideration. The remaining 
parameter then corresponds to scaling the Hermite functions. 

In analogy with the analysis in subsection EMS-4c, we thus explored the OT- 
distance of the STFT-based ConceFT tvPS of signals in C to their itvPS, for dif¬ 
ferent rescalings and different numbers J of Hermite functions. Minimizing this 
OT-distance led us to picking Hermite functions for which the underlying Gaussian 
function was scaled so that the bandwidth h = 5/16; that is, the Gaussian function 
is (when measured in samples, since the sampling rate is 160Hz, 


this corresponds to an effective width of 600 samples, or 3.75 sec, for the window 
functions hj)] the optimal number J of functions was 4. We then kept these pa¬ 
rameter choices for our further experiments. The number N of randomly picked 
linear combinations of the window functions was taken to be 20, as in the CWT 
case. 

Once all the parameters are fixed, we can use the calibrated STFT-based Con¬ 
ceFT approach to study noisy versions of signals in C. To compress dynamical 
range of the tvPS plots, we use the same trick as for Figure 7 of the main paper: 
we first reduce all the tvPS to the same total “energy”, by multiplying each dis¬ 
cretized tvPS Pv G with an appropriate constant so that the “mean energy” 

of all entries equals the same number for all subfigures; that is, for some 0 > 0 
so that ;;^Er=iELi (^y)ki = We take 0 = 5, as in the main paper, for 
Figures 7 and 9. Then we plot R G rather than Py G itself, where 

Rk,i '■= log(l-l-min{(Py)fc_;, g}), k = 1,... ,m, I = 1,... ,n and g is the same cut-off 
as used in the main paper in Figures 7 and 9 (ensuring that the gray-scale value 
plots are all comparable). 

Figure [S)6| shows the results for STFT-based SST, with a Gaussian window with 
the bandwidth h = 5/16, on the (discretized) signal V(tk) = s(tk) + cr^k, where the 

are i.i.d. realizations of a Gaussian noise process, and cr is chosen so that the 
SNR equals 0 dB; next, it shows the tvPS obtained with multi-taper SST, using 
the first 6 Hermite functions as window tapers (note that the Gaussian used for 
the simple SST is included among these; it is the lowest-order Hermite function); 
finally it shows the result of GonceFT, averaging SST results using 20 random linear 
combinations of those same 6 Hermite functions. In all three cases the OT-distance 
to the itvPS is given as well. Figure [S?7| is entirely similar, but now for the signal 
s* rather than s. 

For Figures 7 and 9 in the main paper, and Figures |S.6| and |S.7[ we used the 
Matlab command imagesc to generate the plots of the different tvPS Py, in the 
form image sc (log [l -P Py] , [0 log(l -Pg)]), where g > 0; the two-entry array 
[0 log(l + g)] in this expression ensures that the gray scale value at point (t,^) 
in the plot is linearly proportional to the value of log [l -P max (Py(C'C):'?)]: with 
white standing for 0 and black for g. The same value of g is used for all the tvPS 
plots. Plotting log(l -P Py) rather than Py itself makes it possible to display a 
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wider dynamical range; fixing the full gray-scale range to cover exactly [0, log(l -I- 
q) ] in each plot ensures that the figures present a fair visual comparison of the 
different tvPS. The maximum q also functions as a saturation cut-off: all values 
of Py(t,5) exceeding q are rendered as black in the plots, regardless of the excess. 
The numerical value of q was picked so that the saturation cut-off is active on only 
an exceedingly low number of outliers. 

Note that we have systematically normalized the Py to have the same mean, 
which we picked to be 5 here. This value is not completely arbitrary: we picked 
it so that the dynamical-range compressing function log(l -P Py) makes the noise 
artifacts clearly visible. Different values are possible, and the choice depends on 
the applications; different types of signals and different desired visualization char¬ 
acteristics, typically correspond to different choices for 9. 

The other issue is the determination of a “natural” or “good” value for the 
cut-off q. For the purposes of this paper, where we wanted to give a visualiza¬ 
tion of the goodness-of-fit to the itvPS of a signal s of the tvPS for different noisy 
versions Y (obtained by adding to s different types of noise, possibly also of dif¬ 
ferent strength), and compare these for different analysis methods and different 
noise types/strengths, it is most natural to fix a uniform value of q (after uniform 
normalization of the Py). When ConceFT is used in practice, however, we expect 
to use one particular analysis method, to have at hand only one realization of the 
(unknown) noise process, and (of course) not to have a ground truth with which 
to compare. An important role of q, for the rescaling used by imagesc for the 
visual display, is to downplay an otherwise exaggerated impact from outliers. To 
determine q, based only on Py itself, it would thus be natural to choose it as some 
fixed perc entile of the distrib ution of values of Py. 

Figures S.8 through S.IO show the same Py as also plotted in Figure 7 in the 


main paper (for CWT) and Figures S.6 and S.7 (for STFT), but with a plotting 


scheme that corresponds more to the realistic signal analysis situation, where we 
only have the signal at hand. More precisely, for Figures |S.8| through |S.10| we 
do not normalize the tvPS to have the same total “energy”, and we determine 


the cut-off q for each Py individually; we plot R £ 


which is defined as 


Rk,i ■= log(l + min{(Py)fc_;, g}), k = 1,..., m, I = 1,..., n, where for each plot, 
the value q is given by the 99.8% percentile of only the Py of the transform/data 
for that individual plot itself. The difference between the two cases is striking, 
especially for the STFT figures. This suggests a more thorough exploration would 
be useful of how to optimally pick q depending on noise and signal structure and 
on analysis method chosen; this is beyond the scope of this paper, however. 
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Figure S.6. STFT-based results for noisy versions of the same 
signal s as in Figures 3, 4 and the top half of Figure 7 in the main 
paper; the noisy versions considered are also the same realizations 
as in Figure 4 in the main paper. Top: itvPS of the clean signal 
s; next three columns: the tvPS of Y with different noises. Each 
column corresponds to one algorithm; each row to one noise type. 
Noise types: from top to bottom, in order: Gaussian, ARMA(1,1) 
and Poisson noise, in each case with SNR of 0 dB. Different ap¬ 
proaches: from left to right, in order: SST using a Gaussian win¬ 
dow with the bandwidth h = 5/16; STFT-based multi-taper-SST 
(using the top 6 Hermite functions: the same Gaussian again, and 
the next 5 Hermite functions); STFT-based ConceFT (using 20 
random combinations of the top 4 Hermite functions). 
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Figure S.7. STFT-based results for noisy versions of the same 
signal s* as in Figures S.2 S.3 and the bottom half of Figure 7 in 


the main paper; the noisy versions considered are also the same re¬ 
alizations as in Figure [ST3| Top: itvPS of the clean signal s*; next 
three columns: the tvPS of Y* with different noises. Each column 
corresponds to one algorithm; each row to one noise type. Noise 
types: from top to bottom, in order: Gaussian, ARMA(1,1) and 
Poisson noise, in each case with SNR of 0 dB. Different approaches: 
from left to right, in order: SST using a Gaussian window with the 
bandwidth h = 5/16; STFT-based multi-taper-SST (using the top 
6 Hermite functions: the same Gaussian again, and the next 5 
Hermite functions); STFT-based GonceFT (using 20 random com¬ 
binations of the top 4 Hermite functions). 
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Figure S.8. This figure contains CWT-based ConceFT of the 
same signal s as in Figure 7 in the main paper, but with a different 
truncation plotting strategy (see text). First row: results for the 
signal s; second row: results for the signal s*. Left to right: ideal 
time-varying TF power spectrum (itvPS) for the clean signal, fol¬ 
lowed by results of ConceFT with Morse wavelets after (in order) 
Gaussian, ARM A (1,1) or Poisson noise was added, with SNR of 0 
dB. The figures are plotted with q chosen to be the 99.8% quan¬ 
tile of each figure, and without normalizing Py. For each of the 
tvPS panels, the header gives the OT distance to the corresponding 
itvPS, which is the same as before, since the Py are the same. 
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Figure S.9. STFT-based ConceFT results for noisy versions of 
the same signal s as in Figures |S.6| and the top half of Figure 7 
in the main paper; the noisy versions considered are also the same 
realizations as in Figures 3 and 4 in the main paper. The plot¬ 
ting strategy is different, however (see text). Top: itvPS of the 
clean signal s; next three columns: the tvPS of Y with different 
noise types. Each column corresponds to one algorithm; each row 
to one noise type. Noise types: from top to bottom, in order: 
Gaussian, ARMA(1,1) and Poisson noise, in each case with SNR 
of 0 dB. Different approaches: from left to right, in order: SST 
using a Gaussian window with the bandwidth h = 5/16; STFT- 
based multi-taper-SST (using the top 6 Hermite functions: the 
same Gaussian again, and the next 5 Hermite functions); STFT- 
based ConceFT (using 20 random combinations of the top 4 Her¬ 
mite functions). The figures are plotted with q chosen to be the 
99.8% quantile of each figure, and without normalizing Py. For 
each of the tvPS panels, the header gives the OT distance to the 
corresponding itvPS, which is the same as before, since the Py are 
the same. 
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Figure S.IO. STFT-based ConceFT results for noisy versions of 
the same signal s* as in Figures [S?7| and the bottom half of Figure 7 
in the main paper; the noisy versions considered are also the same 
realizations as in Figures [S^ and [S31 The plotting strategy is dif¬ 
ferent, however (see text). Top: itvPS of the clean signal s*; next 
three columns: the tvPS of Y* with different noises. Each column 
corresponds to one algorithm; each row to one noise type. Noise 
types: from top to bottom, in order: Gaussian, ARMA(1,1) and 
Poisson noise, in each case with SNR of 0 dB. Different approaches: 
from left to right, in order: SST using a Gaussian window with the 
bandwidth h = 5/16; STFT-based multi-taper-SST (using the top 
6 Hermite functions: the same Gaussian again, and the next 5 
Hermite functions); STFT-based GonceFT (using 20 random com¬ 
binations of the top 4 Hermite functions). The figures are plotted 
with q chosen to be the 99.8% quantile of each figure, and with¬ 
out normalizing Py. For each of the tvPS panels, the header gives 
the OT distance to the corresponding itvPS, which is the same as 
before, since the Py are the same. 












